Automatic Segmentation of the IAM Off-line Database for Handwritten English Text
Authors
Abstract
This paper presents an automatic segmentation scheme for cursively handwritten text lines that uses the transcriptions of the text lines and a hidden Markov model (HMM)-based recognition system. The segmentation scheme was developed and tested on the IAM database, which contains offline images of cursively handwritten English text. The original version of this database provides ground truth only for complete lines of text, not for individual words. The method described in this paper greatly improves the usability of the database, because accurate bounding boxes and ground truth for individual words (including punctuation characters) are now available as well. Applied to 417 pages of handwritten text, the segmentation scheme achieved a correct word segmentation rate of 98%, producing correct bounding boxes for over 25’000 handwritten words.
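The abstract states only that the line transcription and an HMM-based recognizer are combined to find word boundaries. One common way to realize this idea is Viterbi forced alignment: the known transcription is expanded into a left-to-right chain of units (here, alternating space and word units), and dynamic programming finds the best in-order assignment of image columns to those units. The sketch below is a deliberately simplified illustration, not the paper's system. It assumes one state per unit, uniform transition probabilities, and a precomputed score matrix `scores[k][t]`; the function name `force_align`, the `<sp>` unit, and the toy scores are hypothetical.

```python
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")


def force_align(scores: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Viterbi-style forced alignment of K in-order units to T frames.

    scores[k][t] is an assumed log-likelihood of frame (column) t under unit k.
    Each unit covers at least one contiguous frame, and units keep transcription
    order. Returns one (start, end) frame range per unit, end exclusive.
    """
    K, T = len(scores), len(scores[0])
    if K > T:
        raise ValueError("need at least one frame per unit")
    # best[k][t]: best total score with frames 0..t aligned and frame t in unit k
    best = [[NEG_INF] * T for _ in range(K)]
    # entered[k][t]: True if frame t is the first frame of unit k on the best path
    entered = [[False] * T for _ in range(K)]
    best[0][0] = scores[0][0]
    for t in range(1, T):
        best[0][t] = best[0][t - 1] + scores[0][t]
    for k in range(1, K):
        for t in range(k, T):
            stay = best[k][t - 1]       # self-loop: unit k continues
            enter = best[k - 1][t - 1]  # transition from unit k - 1
            if enter > stay:
                best[k][t] = enter + scores[k][t]
                entered[k][t] = True
            else:
                best[k][t] = stay + scores[k][t]
    # Backtrack from the last unit at the last frame.
    spans: List[Tuple[int, int]] = []
    t, end = T - 1, T
    for k in range(K - 1, -1, -1):
        while t > 0 and not entered[k][t]:
            t -= 1
        spans.append((t, end))
        end, t = t, t - 1
    spans.reverse()
    return spans


if __name__ == "__main__":
    # Toy line: 10 columns, ink (1) or background (0); transcription "to be".
    ink = [0, 1, 1, 1, 0, 0, 1, 1, 1, 0]
    units = ["<sp>", "to", "<sp>", "be", "<sp>"]
    # Hypothetical scores: space units prefer background, word units prefer ink.
    scores = [[0.0 if (u == "<sp>") == (c == 0) else -5.0 for c in ink] for u in units]
    spans = force_align(scores)
    print([(u, s) for u, s in zip(units, spans) if u != "<sp>"])
    # [('to', (1, 4)), ('be', (6, 9))]
```

The column ranges of the word units, combined with the line's vertical extent, give candidate word bounding boxes. A real system would use multi-state character HMMs trained on handwriting features rather than a binary ink score.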
Similar resources
Using an artificial neural network approach for off-line sentence segmentation
This paper uses an Artificial Neural Network (ANN) architecture to segment unconstrained handwritten English sentences into single words. The ANN receives a feature set of the handwritten text line and classifies each column of the image as belonging either to a word or to a gap between words. As a result, sequences of columns with the same classification represent the segmented words or inter-word gaps...
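The last step described above, turning per-column decisions into words, amounts to grouping consecutive columns that share a label. The sketch below assumes the network's output has already been thresholded into the strings `"word"` and `"gap"`; those labels and the function names are hypothetical.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def column_runs(labels: Sequence[str]) -> List[Tuple[str, int, int]]:
    """Group consecutive columns with the same label into (label, start, end) runs."""
    runs: List[Tuple[str, int, int]] = []
    col = 0
    for label, group in groupby(labels):
        width = sum(1 for _ in group)
        runs.append((label, col, col + width))  # end column is exclusive
        col += width
    return runs


def word_spans(labels: Sequence[str]) -> List[Tuple[int, int]]:
    """Keep only the runs classified as words."""
    return [(start, end) for label, start, end in column_runs(labels) if label == "word"]


if __name__ == "__main__":
    # Hypothetical per-column decisions for a 9-column line.
    labels = ["gap", "word", "word", "gap", "gap", "word", "word", "word", "gap"]
    print(word_spans(labels))  # [(1, 3), (5, 8)]
```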
Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
Automatic recognition of printed and handwritten texts facilitates data entry and digitization, and it is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, owing to advances in machine learning methods, efforts are being made to iden...
An Integrated System for Handwritten Document Image Processing
In this paper we attempt to address common problems of handwritten documents, such as non-parallel text lines on a page, hill-and-dale writing, and slanted and connected characters. To this end, an integrated system for document image preprocessing is presented. This system consists of the following modules: skew angle estimation and correction, line and word segmentation, slope and slant correction...
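The abstract names skew angle estimation as the first module but does not say how it works. A widely used generic technique, not necessarily the one in that paper, is the projection-profile method: shear the ink pixels by each candidate angle and keep the angle whose horizontal projection is most sharply peaked. The sketch assumes the page is given as a list of `(x, y)` ink-pixel coordinates with `y` growing downward, approximates rotation by a vertical shear (adequate for small angles), and scores peakedness as the sum of squared row counts; the function names are hypothetical.

```python
import math
from typing import Dict, Iterable, Sequence, Tuple


def profile_energy(pixels: Iterable[Tuple[int, int]], angle_deg: float) -> int:
    """Sum of squared row counts after shearing pixels by the candidate angle."""
    tan_a = math.tan(math.radians(angle_deg))
    counts: Dict[int, int] = {}
    for x, y in pixels:
        row = round(y - x * tan_a)  # undo a line that rises by tan_a per column
        counts[row] = counts.get(row, 0) + 1
    return sum(c * c for c in counts.values())


def estimate_skew(pixels: Sequence[Tuple[int, int]], angles: Iterable[float]) -> float:
    """Return the candidate angle (degrees) with the most peaked projection."""
    return max(angles, key=lambda a: profile_energy(pixels, a))


if __name__ == "__main__":
    # Synthetic text line drawn at a 4-degree slope.
    slope = math.tan(math.radians(4))
    line = [(x, round(20 + x * slope)) for x in range(200)]
    print(estimate_skew(line, range(-10, 11)))  # 4
```

When the candidate angle matches the skew, every pixel of a text line falls into the same few rows, which maximizes the squared-count score; a real system would then rotate the image by the negated angle.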
Off-line Cursive Handwritten Word Segmentation, A new approach
The segmentation of off-line cursive handwritten words is an important step in cursive handwriting recognition. In this paper, a new, simple yet effective approach is proposed. The proposed technique is based on the analysis of the ligatures between the characters in the cursive word. The only preprocessing step is skeletonizing the word, to allow for variations in pen thickness and tilt in writing. There is no const...
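To make the ligature idea concrete, here is a minimal heuristic sketch, an assumption for illustration rather than the authors' algorithm: after skeletonization, a ligature joining two letters is usually a thin, roughly horizontal stroke, so an interior run of columns crossed by exactly one skeleton pixel is treated as a candidate ligature, and the word is cut at the middle of that run. The input format (a 0/1 grid) and the name `ligature_cuts` are hypothetical. The rule can over-segment letters whose own strokes are one pixel thick in some columns.

```python
from typing import List, Sequence


def ligature_cuts(skeleton: Sequence[Sequence[int]]) -> List[int]:
    """Return candidate cut columns in a binary skeleton image (1 = skeleton pixel)."""
    height, width = len(skeleton), len(skeleton[0])
    counts = [sum(skeleton[r][c] for r in range(height)) for c in range(width)]
    ink_cols = [c for c, n in enumerate(counts) if n > 0]
    if not ink_cols:
        return []
    first, last = ink_cols[0], ink_cols[-1]
    cuts: List[int] = []
    c = first
    while c <= last:
        if counts[c] != 1:
            c += 1
            continue
        start = c
        while c <= last and counts[c] == 1:
            c += 1
        # Runs touching the word's outer edges are entry/exit strokes, not ligatures.
        if start > first and c <= last:
            cuts.append((start + c - 1) // 2)  # middle column of the run
    return cuts


if __name__ == "__main__":
    # Two 2-column "letters" (rows 1-3) joined by a ligature on row 3.
    grid = [[0] * 11 for _ in range(5)]
    for col in (1, 2, 7, 8):
        for row in (1, 2, 3):
            grid[row][col] = 1
    for col in (3, 4, 5, 6):
        grid[3][col] = 1
    print(ligature_cuts(grid))  # [4]
```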